Detection of Plagiarism in Arabic Documents
Abstract
Many language-sensitive tools for detecting plagiarism in natural language documents have been developed, particularly for English. Language-independent tools exist as well, but are considered restrictive because they usually do not take specific language features into account. Detecting plagiarism in Arabic documents is a particularly challenging task because of the complex linguistic structure of Arabic. In this paper, we present a plagiarism detection tool that compares Arabic documents to identify potential similarities. The tool is based on a new comparison algorithm that uses heuristics to compare suspect documents at different hierarchical levels to avoid unnecessary comparisons. We evaluate its performance in terms of precision and recall on a large data set of Arabic documents, and show its capability in identifying direct and sophisticated copying, such as sentence reordering and synonym substitution. We also demonstrate its advantages over other plagiarism detection tools, including Turnitin, the well-known language-independent tool.
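The abstract does not describe the comparison algorithm in detail. As a rough illustration of what comparing at hierarchical levels to avoid unnecessary comparisons could look like, here is a minimal Python sketch. Everything in it is an assumption and not the paper's method: the paragraph and sentence levels, the Jaccard word overlap used as a similarity measure, the function names, and both threshold values are hypothetical.

```python
import re

def tokens(text):
    # Hypothetical tokenizer: Unicode word characters only.
    # A real Arabic tool would also normalize and stem the words.
    return set(re.findall(r"\w+", text))

def jaccard(a, b):
    # Word-set overlap, used here as a stand-in similarity measure.
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def split_sentences(paragraph):
    # Split on Latin and Arabic sentence-ending punctuation.
    return [s.strip() for s in re.split(r"[.!?؟]+", paragraph) if s.strip()]

def compare_documents(suspect, source, para_threshold=0.2, sent_threshold=0.6):
    """Return (suspect_sentence, source_sentence, score) for similar sentences.

    Sentences are compared only inside paragraph pairs that already share
    enough vocabulary, so dissimilar paragraph pairs are skipped cheaply.
    """
    matches = []
    for sus_par in suspect.split("\n\n"):
        for src_par in source.split("\n\n"):
            # Coarse level: skip paragraph pairs with little word overlap.
            if jaccard(tokens(sus_par), tokens(src_par)) < para_threshold:
                continue
            # Fine level: compare sentences within the surviving pair.
            for sus_sent in split_sentences(sus_par):
                for src_sent in split_sentences(src_par):
                    score = jaccard(tokens(sus_sent), tokens(src_sent))
                    if score >= sent_threshold:
                        matches.append((sus_sent, src_sent, score))
    return matches
```

Because sentences are matched as word sets within a paragraph pair, this sketch still finds copied sentences after they are reordered. It would not catch synonym substitution, which needs language-specific resources that the sketch leaves out.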
Related articles
Hybrid Segmentation Prototype for Arabic Text-Based Documents: Towards Plagiarism Detection
The contribution of this work relates to the field of Arabic text-based document analysis for the detection of plagiarism. This analysis will be carried out according to the triadic computation model of document similarity. The authors propose a hybrid segmentation prototype for Arabic text-based documents that links different processing steps in order to generate the similarity rate between th...
A New Corpus for the Evaluation of Arabic Intrinsic Plagiarism Detection
The present paper introduces the first corpus for the evaluation of Arabic intrinsic plagiarism detection. The corpus consists of 1024 artificial suspicious documents in which 2833 plagiarism cases have been inserted automatically from source documents.
Plagiarism Detection In Arabic Scripts Using Fuzzy Information Retrieval
The structure of the Arabic language exposes the need for fuzzy or vague concepts to reveal dishonest practices in Arabic documents. In this paper, we present a statement-based plagiarism detection approach in Arabic scripts using a fuzzy-set IR model. The degree of similarity is calculated and compared to a threshold value to judge whether two statements are the same or different. Our corpus c...
Discrepancies Detection in Arabic and English Documents
This paper analyzes and compares the results of methods for discrepancy detection based on character n-gram profiles (the set of normalized character n-gram frequencies of a text) for English and Arabic documents. English and Arabic texts were analyzed in terms of many statistical characteristics. We covered some statistical differences between both languages and we app...
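To make the notion of a character n-gram profile concrete, the following is a minimal Python sketch that builds the set of normalized character n-gram frequencies for a text. The profile definition follows the abstract. The function names, the default n = 3, and the distance function (a sum of absolute frequency differences) are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter

def ngram_profile(text, n=3):
    # Slide a window of n characters over the text and count each n-gram.
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    # Normalize counts so that the frequencies sum to 1.
    return {gram: count / total for gram, count in Counter(grams).items()}

def profile_distance(p, q):
    # Hypothetical dissimilarity: sum of absolute frequency differences.
    # It is 0 for identical profiles and 2 for profiles with no shared n-gram.
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

Python strings are sequences of Unicode code points, so the same functions work on Arabic text without changes, although diacritics then count as separate characters.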
Plagiarism Detection in Arabic Documents: Approaches, Architecture and
Plagiarism detection is a sensitive field of research that has gained a lot of interest in the past few years. Although plagiarism detection systems are developed to check text in a variety of languages, they perform better when they are dedicated to a specific language, because they take the specific features of that language into account, which leads to better-quality results. Query optimization and d...